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Abstract — We develop an Accelerated Back Pressure (ABP) 
algorithm using Accelerated Dual Descent (ADD), a distributed 
approximate Newton-like algorithm that only uses local information. 
Our construction is based on writing the backpressure algorithm as 
the solution to a network feasibility problem solved via stochastic 
dual subgradient descent. We apply stochastic ADD in place of 
the stochastic gradient descent algorithm. We prove that the ABP 
algorithm guarantees stable queues. Our numerical experiments 
demonstrate a significant improvement in convergence rate, espe- 
cially when the packet arrival statistics vary over time. 

I. Introduction 

This paper considers the problem of joint routing and schedul- 
ing in packet networks. Packets are accepted from upper layers 
as they are generated and marked for delivery to intended 
destinations. To accomplish delivery of information nodes need 
, to determine routes and schedules capable of accommodating the 
generated traffic. From a node-centric point of view, individual 
nodes handle packets that are generated locally as well as packets 
received from neighboring nodes. The goal of each node is 
to determine suitable next hops for each flow conducive to 
successful packet delivery. 

A joint solution to this routing and scheduUng problem is 
' offered by the backpressure (BP) algorithm [1]. In BP nodes keep 
track of the number of packets in their local queues for each 
flow and share this information with neighboring agents. The 
differences in the number of queued packets at two neighboring 
terminals are computed for all flows and the transmission capacity 
of the link is assigned to the flow with the largest queue 
differential. We can interpret this algorithm by identifying queue 
differentials with pressure to send packets on a link. Regardless 
of interpretation and despite its simpUcity, BP can be proved to be 
an optimal policy if given arrival rates can be supported by some 
routing-scheduling policy, they can be supported by BP. Notice 
that BP relies on queue lengths only and does not need knowledge 
or estimation of packet arrival rates. It is also important to 
recognize that BP is an iterative algorithm. Packets are initially 
routed more or less at random but as queues build throughout the 
network suitable routes and schedules are eventually learned. 

A drawback of BP is the slow convergence rate of this iterative 
process for route discovery. This is better understood through an 
alternative interpretation of BP as a dual stochastic subgradient 
descent algorithm [2], [3]. Joint routing and scheduling can be 
formulated as the determination of per-flow routing variables that 
satisfy link capacity and flow conservation constraints. In this 
model the packet transmission rates are abstracted as continuous 
variables. To solve the resulting feasibility problem we associate 
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Lagrange multipliers with the flow conservation constraints and 
proceed to implement subgradient descent in the dual domain. 
This solution methodology leads to distributed implementations 
and for that reason is commonly utilized to find optimal operating 
points of wired [4]-[6] and wireless communication networks 
[7J-[9J. More important to the discussion here, the resulting 
algorithm turns out equivalent to BP with queue lengths taking 
the place of noisy versions of the corresponding dual variables. 
The slow convergence rate of BP is thus expected because the 
convergence rate of subgradient descent algorithms is logarithmic 
in the number of iterations [10]. Simple modifications can speed 
up the convergence rate of BP by rendering it equivalent to 
stochastic gradient descent [11] - as opposed to stochastic sub- 
gradient descent. Nevertheless, the resulting linear convergence 
rate is still a limiting factor in practice. 

To speed up the convergence rate of BP we need to incorporate 
information on the curvature of the dual function. This could 
be achieved by using Newton's method, but the determination 
of dual Newton steps requires coordination among all nodes in 
the network. To overcome this Umitation, methods to determine 
approximate Newton steps in a distributed manner have been 
proposed. Early contributions on this regard are found in [12], 
[13]. Both of these. Both of these methods, however, are not fully 
distributed because they require some level of global coordina- 
tion. Efforts to overcome this shortcoming include approximating 
the Hessian inverse with the inverse of its diagonals [14], the use 
of consensus iterations to approximate the Newton step [15], [16], 
and the accelerated dual descent (ADD) family of algorithms 
that uses a Taylor's expansion of the Hessian inverse to compute 
approximations to the Newton step [17]. Members of the ADD 
family are indexed by a parameter N indicating that local Newton 
step approximations are constructed at each node with informa- 
tion gleaned from agents no more than N hops away. Algorithms 
ADD-7V were proven to achieve quadratic convergence rate and 
demonstrated to reduce the time to find optimal operating points - 
as measured by the number of communication instances - by two 
orders of magnitude with respect to regular subgradient descent. 
It has been shown that stochastic ADD converges for the Network 
flow optimization in [18]. The goal of this paper is to adapt the 
ADD family of algorithm to develop viiriations of BP that achieve 
quadratic convergence. 

We start the paper introducing the problem of stabilizing 
queues in a communication network and review the backpressure 
algorithm, [1] used to solve this problem (Section 11). We proceed 
to demonstrate that the backpressure algorithm is equivalent 
to stochastic subgradient descent (Section 11-A). Once in the 
optimization framework we review the generalization to soft 
backpressure (Section II-C). In Section III, we construct the 
accelerated backpressure algorithm, which is an approximation 
of Newton's method using local information and limited com- 
munication with neighboring nodes. In Section IV, we prove that 



Algorithm 1: Backpressure at node i 



1 for t = 0,1,2, • 



do 



7 

8 end 



Observe local queue lengths {qi{i)}k for all flows k 
for all neighbors j £ rii do 

Send queue lengths {gf (t)}fc - Receive lengths {qj{t)}k 
Determine flow with largest pressure: 

kij = argmax ^qf{t) - 

Set routing variables to rfj (t) = for all fc k*j and 

r-!f^(i) = a,l{9f^(t)-9f(t)>0} 



fc,* 



end 



Transmit r^J^ {t) packets for flow k*j 



the accelerated backpressure algorithm stabilizes the queues. In 
Section V, we present numerical experiments which demonstrate 
the performance gains associated with implementing the accel- 
erated backpressure algorithm as compared to the backpressure 
algorithm from [1] and soft backpressure algorithm from [11]. 

The main contribution of this work is the introduction of a 
locally computable second order method for solving the queue 
stabilization problem. This is particular relevant for cases where 
the packet arrival statistics vary with time. As shown in Figure 
3, the accelerated backpressure algorithm can effectively stabilize 
queues in networks whose arrival rates vary at higher frequencies 
than its first order parent algorithms. 

II. Preliminaries 

Consider a given a network G = {V,£} where V is the set 
of nodes and £ C V x V is the set of links between nodes. 
Denote as Cij the capacity of link € £ and define the 

neighborhood of i as the set rij = {j S V\{i,j) G £} of nodes 
j that can communicate directly with i. There is also a set of 
information flows JC with the destination of flow k G K. being 
the node o/c e V. 

At time index t terminal i ^ Ok generates a random number 
a\{t) of units of information to be delivered to Ok- The random 
variables a^(t) > are assumed independent and identically 
distributed across time with expected value E [oj^(i)] = a^. In 
the same time slot node i routes r*j (t) > units of information 
through neighboring node j G rij and receives r'^^{t) > packets 
from neighbor j. The difference between the total number of 



received packets ^'ji(i) and the sum of transmitted 

packets X^^eni '"'iji^) added to the local queue - or subtracted 
if this quantity is negative. Therefore, the number (t) of fc-flow 
packets queued at node i evolves according to 



(1) 



where the projection [•]+ into the nonnegative reals is necessary 
because the number of packets in queue cannot become negative. 
We remark that (1) is stated for all nodes i 7^ because packets 
routed to their destinations are removed from the system. 



To ensure packet delivery it is sufficient to guarantee that all 
queues + 1) remain stable. In turn, this can be guaranteed 
if the average rate at which packets exit queues do not exceed 
the rate at which packets are loaded into them. To state this 
formiilly observe that the time average limit of arrivals staisfies 
Vmvt^oo = E [af(t)] := and define the ergodic limit 

:= limt_>.oo rfjli). If the processes r^^it) controlling the 
movement of information through the network are asymptotically 
stationary, queue stability follows if ^ 



E 



y k,ij^ Ok- 



(2) 



For future reference define the vector r := {rij}k,i^ok,j grouping 
variables for all flows and links. Since at most Cij packets 
can be transmitted in the link the routing variables r-^ (f) 
must satisfy 

J24it)<Cij. (3) 

fe 

The joint routing and scheduling problem can be now formally 
stated as the determination of nonnegative variables rfj{t) > 
that satisfy (3) for all times t and whose time average limits rf^ 
satisfy (2). The BP algorithm solves this problem by assigning all 
the capacity of the link {i, j) to the flow with the largest queue 
differential qf{t) — qj{t). Specifically, for each link we determine 
the flow pressure 



argmax [q^ {t) - q^ {t)]^ . 



(4) 



If the maximum pressure max/j [q^ (t) — q^ (i)] > is strictly 
positive we set rfj{t) = Cij for k = k*j. Otherwise the link 
remains idle during the time frame. The resulting algorithm is 
summarized in Algorithm 1. The backpressure algorithm works 
by observing the queue differentials on each fink and then 
assigning the capacity for each link to the data type with the 
largest positive queue differential, thus driving the time average 
of the queue differentials to zero- stabilizing the queues. For 
the generalizations developed in this paper it is necessary to 
reinterpret BP as a dual stochastic subgradient descent as we 
do in the following section. 

A. Dual stochastic subgradient descent 

Since the parameters that are important for queue stability are 
the time averages rfj of the routing variables r^- {t) an alternative 
view of the joint routing and scheduling problem is the determi- 
nation of variables r^j satisfying (2) and J2k '''ij — ^ij- ^^^^ 
be formulated as the solution of an optimization problem. Let 
fiji^tj) arbitrary concave functions on M+ and consider the 
optimization problem 

r* —argmax ^ /^(rf^.) 

<Cij, y{i,j)G£. 



(5) 



keic 



Stability is guaranteed only if the inequalities hold in a strict sense, i.e., 
Yljen ^ij ~ ^ji > '^i- Equality is allowed here to facilitate connections with 
optimization problems to be considered later on. 



where the optimization is over nonnegative variables > 0. 
Since only feasibility is important for queue stability, solutions to 
(5) ensure stable queues irrespectively of the objective functions 

f ij i^ij ) ■ 

Since the problem in (5) is concave it can be solved by 
descending on the dual domain. Start by associating multipliers 



with the constraint J2 



> a'l and keep the 



constraint r^^ < Cij implicit. The corresponding Lagrangian 
associated with the optimization problem in (5) is 



(6) 



where we introduced the vector A := {\^}k.i^o^ grouping 
variables A*^ for all flows and nodes. The corresponding dual 
function is defined as h{X) := max^-^ r?" <Cij ^{'''i 

To compute a descent direction for h{X) define the pri- 
mal Lagrangian maximizers for given A as r^j(A) := 
argmax^^ rk,<Cij '^(^' ^ descent direction for the dual func- 
tion is available in the form of the dual subgradient whose 
components (A) are obtained by evaluating the constraint slack 
associated with the Lagrangian maximizers 
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(A) 



(7) 



Since the Lagrangian C{r, A) in (6) is Unear in the routing 
variables rf^ the determination of the maximizers r^j{X) := 
argmax^^ ^fc £(r, A) can be decomposed into the maxi- 
mization of separate summands. Considering the coupling con- 
straints J2k """ij — ^iJ suffices to consider variables {rfj}k for 
all flows across a given link. After reordering terms it follows 
that we can compute rfj (A) as 



r'yj (A) = argmax ^ /f^- (4) + rf^- (A,^ - A^^) 



(8) 



s.t. 



fee/c 



Introducing a time index t, subgradients g^{\{t)) could be 
computed using (7) with Lagrangian maximizers r^j(X{t)) given 
by (8). A subgradient descent iteration could then be defined to 
find the variables r* that solve (5); see e.g., [19]. 

The problem in computing g^{X) is that we don't know the av- 
erage arrival rates . We do observe, however, the instantaneous 
rates a^{t) that are known to satisfy E [a*^(t)] = a*^. Therefore, 



(9) 



is a stochastic subgradient of the dual function in the sense that 
its expected value E [gf (A)] = gf{X) is the subgradient defined 
in (7). We can then minimize the dual function using a stochastic 
subgradient descent algorithm. At time t we have multipUers X{t) 
and determine Lagrangian maximizers r*j(t) rfj{X{t)) as per 
(8). We then proceed to update multipliers along the stochastic 
subgradient direction according to 



Xt{t + 1) 



(10) 



Algorithm 2: Dual stochastic subgradient descent 



do 

for all neighbors j G n{i) do 

Send multipliers {\i{t)}k - Receive multipliers {Xj{t)}k 
Determine flow with largest dual variable differential: 

kij = argmax j^Af (t) — Xj (t) j 

Set routing variables to rf^ (t) = for all k ^ k*j and 

r^(t) = a^l{A'^(0-A^^(i)>0} 

k* ■ 

Transmit r^^' (t) packets for flow kij 
end 

Send variables {rij{t)}kj - Receive variables {'r'ji{t)}kj 
Update multipliers {Xi{t)}k along stochastic subgradient 



Af(i + 1) = 



xUt)-e( ^ rUt) + rUt)+ant)) 



9 end 



where e is a constant stepsize chosen small enough so as to ensure 
convergence; see e.g., [11]. 

Properties of the descent algorithm in (10) vary with the 
selection of the functions .fij{r^j). Two cases of interest are when 
fijirij) — and when fijirij) are continuously differentiable, 
strongly convex, and monotone decreasing on M+ but otherwise 
arbitrary. The former allows an analogy with the backpressure as 
given by Algorithm 1 while the latter leads to a variation termed 
soft backpressure. 

B. Backpressure as stochastic subgradient descent 

The classical backpressure algorithm, [1] can be recovered 
by setting = for all hnks flows k and links 

With this selection the objective to be maximized in (8) becomes 

Sfe ^fj (Af (i) — A^(t)). To solve this maximization it suffices to 
find the flow with the largest dual variable differential 



/c^= argmax [A,^(f)-A,^(i)]+. 



(11) 



If the value of the corresponding maximum is nonpositive, i.e., 
maxfe [X^{t) - X'^{t)] < 0, all summands in (8) are nonpositive 
and the largest objective in (8) is attained by making r![j{t) = 
for all flows k. Otherwise, since the sum of routing variables 
rfj{t) must satisfy J2keK.''^ij — ^ij maximum objective is 
attained by making rf^ {t) = Cij for k = k*^ and (t) = for 
all other k. 

The algorithm that follows from the solution of this max- 
imization is summarized in Algorithm 2. The dual stochastic 
subgradient descent is implemented using node level protocols. 
At each time instance nodes send their multiphers Af(f) to 
their neighbors. After receiving multipUer information from its 
neighbors, each node can compute the multiplier differentials 
Xf{t) — Xj{t) for each edge. The node then allocates the full 
capacity of each outgoing Unk to which ever commodity has 
the largest differential. Once the transmission rates are set each 
node can observe its net packet gain which is equivalent to the 
stochastic gradient as defined in (9). Finally, each node updates 



Algorithm 3: Soft Backpressure at node i 

1 for t = 0, 1, 2, • • • do 

2 for all neighbors j G n{i) do 

3 Send multipliers {\i{t)}k - Receive multipliers {\jit)}k 

4 Compute i^ij such that 

Y^[xUt)-X'j{t)-Hi,]+ 

k 

Transmit packets at rate 

end 

Send variables {rij(t)}kj - Receive variables {rji{t)}kj 
Update multipliers {Af (t)}fc along stochastic subgradient 



xUt + i) = 



XUt)-e(j2'''oit)-r'i{t)-c^iit) 



Proof: The dual problem niinA h{X) is convex because the 
primal (5) is convex. Convex optimization problems have unique 
maximizers. In our case, the dual gradient comes from differen- 
tiating (6) with respect to A*^ which yields (9). Substituting the 
unique maximizers we have the unique dual gradient. Since h{\) 
has a unique gradient, it is differentiable. 

In order to find the previously mentioned unique maximizers 
we consider the primal optimization (8). We duaUze the addi- 
tional constrain to get the extended Lagrangian 

fc fc 

Considering the KarushKuhnTucker (KKT) optimality conditions 
as defined in [20][Section 5.5] for (15) yields the equations 



9 end 



[A-' - A*' - Hij] 



(16) 
(17) 



its multipliers but subtracting e times ties stochastic subgradient 
from its current multipliers. Choosing the stepsize e = 1 causes 
the multipliers to coincide with the queue lengths for all time. 

Comparing (4) with (11) we see that the assignments of 
flows to links are the same if we identify multipliers Af(<:) 
with queue lengths. Furthermore, if we consider the Lagrangian 
update in (10) with e = 1 we see that they too coincide. Thus, 
we can think of backpressure as a particular case of stochastic 
subgradient descent. From that point of view algorithms 1 and 
2 are identical. The only difference is that since queues take the 
place of multipliers the update in Step 8 of Algorithm 2 is not 
necessary in Algorithm 1. 

C. Soft backpressure 

Assume now that the functions fij{rfj) are continuously 
differentiable, strongly convex, and monotone decreasing on M_|_ 
but otherwise arbitrary. In this case the derivatives dfjj{x)/dx of 
the functions fij{x) are monotonically increasing and thus have 
inverse functions that we denote as 



Fh{x):= [df^j{x)/dx] \x). 



(12) 



The Lagrangian maximizers in (8) can be explicitly written 
in terms of the derivative inverses F^j{x). Furthermore, the 
maximizers are unique for all A implying that the dual function 
is differentiable. We detail these two statements in the foUowing 
proposition. 

Proposition 1 If the functions fijir^j) in (5) are continuously 
differentiable, strongly convex, and monotone decreasing on M+, 
the dual function h{X) := max^--^ r>'<Cij '^('"i differentiable 
for all A. Furthermore, the gradient component along the A^ 
direction is g^iX) as defined in (9) with 

4{X) = 4 (- [A,^ - X^ - ft,(A)]+) . (13) 

where /iy(A) is either OifY,kFU - [^i " ^f]"^) ^ 
chosen as the solution to the equation 

Y^F^i (- [A,' - A,^ - Mii(A)]+) = Q,. (14) 



for all G £. Applying the definition of F^^{-) from equation 
(12) to (16) we get the desired relation in (13). It remains to 
enforce (17) by selection of which gives us condition (14). 
The assertion that /iy = when Y.k - [Af - A*^]"^) < C,, 
hold by the principal of complementary slackness, detailed in 
[20] [Section 5.5]. ■ 
While (14) does not have a closed for solution it can be computed 
quickly numerically using a binary search because it is a simple 
single variable root finding problem. Computation time cost 
remains small compared to communication time cost. 

This yields the Algorithm 3. Sott backpressureis implemented 
using node level protocols. At each time instance nodes send their 
multipliers A*^(t) to their neighbors. After receiving multiplier 
information from its neighbors, each node can compute the 
multipUer differentials Aj^(t) — Xj{t) for each edge. The nodes 
then solve for /i^^ on each of its outgoing edges by using a 
rootfinder to solve the local constraint in (14), The capacity 
of each edge is then allocated to the commodities via reverse 
waterfilling as defined in (13). Once the transmission rates are 
set each node can observe its net packet gain which is equivalent 
to the stochastic gradient as defined in (9). Finally, each node 
updates its multipliers but subtracting e times ties stochastic 
subgradient from its current multiphers. Choosing the stepsize 
e = 1 causes the multiphers to coincide with the queue lengths 
for all time. 

Algorithm 3, is called soft backpressure because rather than 
allocate all of an edges capacity to the data type with the largest 
pressure X^ — X^, it divides the capacity among the different 
data types based on their respective pressures via inverse water 
filling, [11]. We further observe that soft backpressure is solved 
using the same information required of backpressure, the lengths 
of the queues for all data types on either end of the link whose 
flow you are computing. This means that assuming a step size 
e = 1, Algorithm 3 can be implemented without computing the 
multiphers but rather by simply observing the queue lengths qf (t) 
which are equal to the dual variables A^(t). 

An important difference between backpressure and soft back- 
pressiu'e is that the former is equivalent to stochastic subgradient 
descent whereas the latter is tantamount to stochastic gradient 



Algorithm 4: Newton's Method 



1 for t = 0, 1, 2, • • • do 
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for all nodes j € n{i) do 

Send multipliers {Xi{^)}k - Receive multipliers {X'j{t)}k 
Compute i^ij such that 

k 

Transmit packets at rate 

4W = F(-[A^(t)-A,^(i)-Md+) 

Send variables {rfj(t)}kj- Receive variables {rji{t)} 
Compute Hij = [V^£(A, r{X))]ij 

end 

Comptute stochastic subgradient gi = {gi}k 



kj 



for all nodes j £ V do 
Send variables Hij, gi- 



Receive variables Hji, gj 



Compute Hessian inverse blocks [H~^]i 
end 

Update multipliers Xi{i) = {Xi{t)}k along the Newton 
direction 



Ai(t + 1) = 



Xi{t) - Y,[H-%gj 



15 end 



and the diagonal blocks are Ha are positive semidefinite. 

Proof: In order to compute the dual Hessian, we begin by 
computing the optimal flow rates rf^ (A) from the optimal queue 
priorities as defined in (13). Substituting into 



dC{r{\),\) 



(21) 



and differentiating with respect to A we construct the Hessian. 
Since is a constant the key is differentiating r^j(A) with 
respect to A. We can differentiate (13) using the chain rule 
yielding 



dX^ dx dX^ 



(- [A? 



A 



(22) 



The existence of dF^Ax)/dx is guaranteed by our assumptions 
on the edge costs f^y Differentiating — [A^ — Aj^ — /iy] is 
done by observing that this function becomes a saturated linear 
function of when is non-zero and exactly zero when 
rfj = 0. The slope of the linear function is deterrmned by whether 
the edge (z, j) is at capacity (pij > 0). Discontinuities in 
occur when rf^ = and when the projection becomes active, 
resulting in points from the derivative. We define the derivatives 
to be zero out these points since they are adjacent to regions 
where the derivatives are defined and equal to zero. We express 
the derivative in terms of the active sets 



descent because the dual function is differentiable. This improves 
the average convergence rate from logarithmic - expected dis- 
tance to optimal variables decreasing like c/t for some constant 
c - to linear - expected distance to optimality proportional 
to c* for some c. Linear convergence is still not satisfactory, 
however, motivating the accelerated backpressure algorithm that 
we introduce in the following section. 

111. Accelerated Backpressure 

Backpressure type algorithms exhibit slow convergence be- 
cause they are fundamentally first order methods. In the cen- 
tralized deterministic case we can speed up using the Newton 
method. In our case, the stochastic nature of the problem is not 
an issue because the arrival rates do not appear in the Hessian 

Hll Hi2 ■ ■ ■ Hin 



H = 



H. 
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where 



ij\ks 



d^C{r{X),X) 
dX>ydX' 



(19) 



Assuming knowledge of H and the gradient g = {gf} for aU i 
and k, the Newton algorithm is given by Algorithm 4. 

Proposition 2 The Dual Hessian, H{X) = V^£(r(A),A) is 
block sparse with respect to the graph Q: 



Hij=0 s.t{i,j),{j,i)^S 



(20) 



A, = {fce/C:4(A)>0} 



(23) 



for each edge (i, j) and the local variables ^lij and {fij}- Using 
(22) the diagonal elements Hn are defined 

[Hu]kk = E e A.-) C^^l^ - l) 



1 . (24) 



Using same method the off diagonal elements of Hu are com- 
puted 



[Hiihs = ^F'(4)i(fc,se A 



+ F'{r^,)l{k,seAji) 



> 0) 

I 



|./4ji| 



(25) 



(18) We observe that Y^j^ml^ijhk = [Hii]kk from the definitions 
in (24) and (25). Also, F'{r^j) < from (12) and the our 
assumption that fij{-) is monotone decreasing, therefore the 
diagonal elements are positive and according to [21] [Section 6.2] 
Ha is positive semidefinite. 

We can also compute the off diagonal blocks by differentiating 
using (22). The diagonal elements of the the off-diagonal blocks 
Hij are given by 



ij\kk 



+ F'{r%)l{kGAji){l- 



\A.ij I 
> 0) 



(26) 



and the off-diagonal elements of the off-diagonal blocks are 

- F'(r^,)l(fc,seA-.)('^'g^). (27) 

Observe that the action set Aij is empty by definition if there 
is no link G £, therefore k cannot be an element of Aij. 

Plugging this fact into the definition of the off diagonal blocks 
Hij foimd in equations (26) and (27) we conclude that Hij = 
whenever i j and {i, j), (j, i) ^ £. ■ 

Proposition 2 guarantees that all elements of the Hessian can 
be computed using local information. Elements of the Hessian 
require knowledge of the local action sets Aij from (3), which 
are computed using the local flow values {r'-j}k- Furthermore, 
Proposition 2 gives us positive semi-definiteness of the diagonal 
blocks Hii indicating that the nodes depend positively on their 
own queues. This fits our intuition because we expect penalties 
on a specific queue to become larger when those queues become 
larger. Later we will use this fact to show that our splitting matrix 
D is invertable. 

Algorithm 4 is centralized because it requires the inverse of the 
Hessian matrix. For comparison with our algorithms we describe 
node level protocols for implementing Newton's method for the 
backpressure problem. At each time instance nodes send their 
multipliers A* (f) to their neighbors. After receiving multiplier 
information from its neighbors, each node can compute the 
multipUer differentials Af (t) — A^(t) for each edge. The nodes 
then solve for /i^ on each of its outgoing edges by using a 
rootfinder to solve the local constraint in (14), The capacity 
of each edge is then allocated to the commodities via reverse 
waterfilling as defined in (13). Once the transmission rates are 
set each node can observe its net packet gain which is equivalent 
to the stochastic gradient as defined in (9). The nodes must also 
compute the matrix of second derivatives, Hij for each of its 
outgoing edges according to (24)-(27) which are functions of 
local transmission rates {?'fj}/cjen(i)- In order to compute the 
Newton direction nodes must then send their sub gradients and gi 
and all of their Hessian blocks H^j to every node in the Network, 
With this information each node can invert the global Hessian 
matrix and compute its Newton direction di = —^j[H^^]ijgj. 
Finally, each node updates its multipliers by adding to its 
current multipliers. In this algorithm the multipliers are no longer 
equivalent to the queue lengths but can be thought of as "queue 
priorities" which take into account the queue lengths and the 
networks ability to handle this queues. 

A. Distributed approximations of Newton steps 

In order to accelerate backpressure and retain its distributed 
nature we compute the dual update direction based on the ADD- 
N algorithm defined in [22] which leverages the sparsity of 
the Hessian to approximate its inverse. The ADD-N algorithm 
requires a splitting of the Hessian. We define a block splitting 
H = {D + I) — {B + 1) where D is block diagonal defined by 

Da = Hii (28) 

and B is block sparse defined by 

B = D-H. (29) 



The ADD-N method gives us the approximate Hessian inverse 

N 

H-^ « fl-W =Y^({D + I)-'^{B + I)y {D + (30) 

The diagonal blocks [D + Ija are positive definite for all i 
because Da is positive semi-definite from Proposition 2. We 
define simplified notation for our splitting 

H = D-B (31) 

where D = D + 1 and B = B + I giving us the expression 

N 

H^^) =Y^{D-^By D-\ (32) 

T=0 

By construction, the matrix H'^'^'> is block sparse such that 
h\^'^ > only if node i and j are N or fewer hops apart 
in Q. We are interested in the fully distributed case where only 
one hop neighbor information is required. Selecting N = 1, the 
dual update direction is computed as 

di{t) = -DT^Qiit) - DT^BijDT^^gj{t). (33) 

The update direction d'l can be computed using only local 
information acquired by a single exchange of information with 
neighboring nodes as described in Algorithm 5. Each node uses 
knowledge of the flows entering {rj^j}/c and leaving {r^^ jfe to 
compute the active sets Aij. Using the active sets and the flows 
node i can compute the Hessian submatrix Hij for each of its 
neighbors j G n{i). The sub matrix Ha = — J2jen{i) ^n- From 
the Hessian submatrices node i can compute the submatrices of 
our splitting Bij = —Hij, Ba = I, Da = I + Ha. The gradient 
{9i}k is compute from the flows {rfj}k and {rj^jjft- Node i can 
invert D locally because it is block diagonal. Node i exchanges its 
local variables D~l^ and gi = {.gf }fe with its neighbors j G n{i) 
allowing the computation of the approximate newton direction 
at node i, di given by equation (33). The dual update for the 
accelerated backpressure algorithm is given by 

\i{t+l) = \i{t)-J2Hij9j (34) 

where [Aj]fc = is the local vector of duals at node i and Hij 
is the i,j block of the matrix H. The dual updates for variables 
belonging to node i depend only on the variables and gf 
belonging to nodes j G rii. The accelerated backpressure algo- 
rithm is given by Algorithm 6 works like soft backpressure but 
the set of queue priorities Af (<) are not equivalent to the queue 
lengths q^. The queue priorities are updated using information 
about not only node i's queue but also the queues at neighboring 
nodes j G n,. The addition information is weighted according 
the approximate inverse Hessian. The cost of this change is that 
rather than simply observe the queues as they evolve we must 
observe the realized flows r^j and update the queue priorities 
according. This requires one additional exchange of information 
with neighboring nodes, which appears in Algorithm 5. However 
the slightly increased communication overhead of the accelerated 
backpressure algorithm is offset by significant improvement in 
convergence rate which we demonstrate in the following sections. 



Algorithm 5: Local Approximation of Newton's Direction 
1 for all neighbors j € n{i) do 

2 
3 



Observe local flows {rij}k and {rji}k 
Compute active sets 



{kefC: n,{X) > 0} 



Compute Hij according to (26) and (27). 



Set Bij — —Hij, 



6 end 

7 Compute Hii according to (24) and (25). 

8 Compute Dr.^ = {Hu + 

9 Set Bii = I 

10 Compute gi = {gi}k according to 

9i = r%{t) - r^j{t) - aUt) 

11 for all neighbors j G n{i) do 

12 
13 

14 end 

15 Compute di = {di}k according to 

d. 



Communicate D-.^ and gi 
Receive D^^ and gj 



D^9r - E D^BijDJ^gj 



Algorithm 6: Accelerated Backpressure for node i 

1 Observe q^{0). 

2 Initialize A* (0) = qf (0) for all A; and i dest{k) 

3 for t = 0,1,2, - •■ do 

4 for all neighbors j G n(t) do 

5 Send multipliers {\i{t)}k - Receive multipliers {\jit)}k 

6 Compute fj,ij such that 

YlQiit) - 1j{t) + I3ij - 

k 

Transmit packets at rate 



8 
9 
10 



11 
12 



end 

Send variables {^fj(i)}fej - Receive variables {r'ji{t)}kj 
Compute stochastic gradient {gi(t)}k 

Compute di by executing Algorithm 5 
Update the dual variables 

A,'(t+l) = A*=(t) + ed^(t) 



13 end 



IV. Stability Analysis 

In order to claim the accelerated backpressure algorithm is an 
alternative to the backpressure and soft backpressure algorithms 
we need to guarantee that it achieves queue stabihty. In this 
section we leverage results from the the stability analysis of 
the backpressure algorithm detailed extensively in [2]. We begin 
by proving the stability of the dual variables because they are 
analogous to queue lengths in the backpressure analysis. 



Proposition 3 Given a Network Q with packet arrivals 
E[o^(t)] = and edge capacities dj, routing packets according 
to Algorithm 6, we have stable dual variables 



limsuplE5^EAf(T)<^ 



(35) 



T=0 i,k 

for each commodity k and node i ^ dest{k), where B,d > 0. 

Proof: Define the Lyapunov function for the dual vector 
sequence Aj = {^i{t)}i_k for all t > 0, 



L{Xt) = KHt'^t 



(36) 



where Ht = H{Xt). We consider the variation in At = H^^i — 
Ht^. Due to practical constraints on the flow rates r'^j{t) defined 
in (8), and the construction of Ht from (32) with iV = 1, it is 
clear that there exists an upper bound of the singled iteration 
change Aj. We wiU characterize this bound as x' J^tx < b for 
aU t. Using the definition of A^ we can write the change in the 
Lyapunov function 

- L{Xt) = A;+i[5-i + At]Xt+, - KH^'Xt (37) 

Reorganizing terms we have 

L{Xt+i) - L{Xt) = K+i^tXt+i + (At+i - XtYHrHXt+i + At) 

(38) 

Applying equation (34) to substitute Af+i — At = —HtQt and 
simplifying, 

L(At+i) - L{Xt) = A;+iAtAt+i + (-flt)'(2At - Ht9t). (39) 

We next add and subtract /(r) — ^ ■ - j^ fij{rfj) in order to make 
the primal objective from (8) appear, 

L(At+i) - L(At) = A;+iAtAt+i + <7t5t<?t + 2/(r)' 

-2{g[Xt + /(r)) (40) 

where gt is the gradient as defined in (9). Since the flow values 
lie in bounded set defined by the capacities gtHtgt and /(r) both 
have finite maximums. Along with our finite bound x'AiX < b 
we can conclude Aj^j^A; Af+i + gtHtgt + '2f{r) < B for some 
B > Q. We apply this inequality and take the expectation with 
respect to It which is the sigma field including all relevant 
information up to time t including the dual variables, queue 
lengths and packet transmission rates: 

E[L(At+i) - L(At)|It] <B- E[2(5;At + f{r))\It]. (41) 

We now applying the argument from [2][Theorem 4.5] with 
[2] [Corollary 3.9], which (to summarize) uses the fact that r = 
{vij} are the primal optimizers and a small perturbation (5 > 
to the arrival rates {af } to lower bound E[(5itAt + f{r))\It] > 
^T,i,k ^iii)- Thus we have 

E[L(At+i) - i(At)|Xt] <B- 26j2^i{t), (42) 

the necessary condition for the Lyapunov Stability Lemma, 
[2][Lemma 4.1] which guarantees the desired relation. ■ 

Corollary 1 Stability of the dual variables Xf {t) implies stability 
of the queues ql{t). 




Fig. 1. The numerical experiments for tlie ABP algoritfim presented in this 
section are performed on tliis simple 10 node network with 5 data types. The 
destinations are unique for each data type and are chosen randomly. 

Proof: The queues evolve according to (1) while the duals 
evolve according to (34). Both updates are based on the realiza- 
tion of the sequence of subgradients in (9), the difference being 
the matrix product Htgt appearing in (34). From Proposition 1, 
we have the Hessian diagonal blocks Ha are positive definite and 
H is block diagonally dominant. By the construction in (32), we 
can see H also has positive definite diagonal blocks and block 
diagonal dominant. Therefore, the sequence \^{t) only remains 
stable if (2) is satisfied, which is the condition for the stability 
of the queue q^{t). ■ 
Proposition 3 guarantees the stability of the dual variables \^{t) 
(queue priorities) which take the role the queues have in BP and 
SBP for determining routing in the ABP algorithm. Demonstrat- 
ing that the \\{t) sequence is stable tells us that the algorithm 
itself is stable in the sense that the routing assignments stabilize 
(because they are an explicit function of X^{t)). It is Corollary 
1 which guarantees that the queues themselves remain stable. 
This is done by observing that the evolution of \^{t) and q^{t) 
both evolve based on the sequence of dual gradients g^{t). Since 
the ABP algorithm solves the optimization (5), the dual gradient 
(t) tends to zero on average so we are not surprised to see that 
both Af (t) and q^{t) are stable. 

V. Numerical Results 

The ABP algorithm can be demonstrated to outperform back- 
pressure and soft backpressure in numerical experiments. The 
following examples are run using the 10 node network shown in 
Figure 1 with 5 data types. Our choice of objective function is 

+ (43) 

The quadratic term captures an increasing cost of routing larger 
quantities of packets across a link and help to eliminate myopic 
routing choices that lead to sending packets in cycles. The linear 
term /3 is introduced to reward sending packets to their destina- 
tions. In our simulations, j3^j = 10 for all edges routing to their 
respective data type destinations j = dest{k) and all other k, 
/3^j = 0. The Link capacities are select uniformly randomly 
[0,100]. We demonstrate that the ABP algorithm outperforms 
backpressure and soft backpressure when the arrival rates are 
stochastic. Numerical experiments are formulated with the the 
average arrival rates are 5 packets per data type per node. The 
nodes do not know that this is the average and therefore use the 
current realized arrival rate when computing their transmission 
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Fig. 2. Even when the anival rates are unknown, the Accelerated Backpressure 
algorithm stabilizes the queues faster and with smaller queue totals than the Soft 
Backpressure or the traditional Backpressure algorithms 
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Fig. 3. The variation in the underlying statistics for the packet arrival rates 
causes the backpressure and soft backpressure algorithm to stabilize much more 
slowly and in some cases not at all. 



rates. As shown in Figure 2, the ABP algorithm still stabilizes the 
queues after about 30 iterations while soft backpressure requires 
100 and backpressure requires around 150 iterations but retains 
the most volatility. The ABP queues stabilize with around 1000 
queued packets while soft backpressure has about 6000 queued 
packets and the traditional backpressure algorithm has around 
8000 queues packets. In addition to having the smallest queues 
we observe that the accelerated backpressure algorithm also has 
the smallest variance in the queues. 

In order to demonstrate the effect of having a faster conver- 
gence rate we consider the case were the nodes are divided into 
two equal sized sets. At time t one of the two sets will be actively 
receiving packets. The frequency at which we switch between 
active sets captures how fast the system is varying. The example 
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Fig. 4. The Accelerated Backpressure algorithm converges to the optimal dual 
variables at a rate significantly greater than the backpressure or soft backpressure 
algorithm, allowing the algorithm to track the optimal duals even when the 
underlying arrival rates vary. 



shown in Figures 4 and 3 switches active sets every 10 iterations. 
Otherwise it is equivalent to the previous examples where 10 
nodes were used with the topology shown in Figure 1. The key 
observation in 4 is that the dual variables are highly volatile for 
backpressure and soft backpressure. Recalling that in the case of 
backpressure and soft backpressure the duals are equivalent to the 
queue lengths, volatility in the duals is volatility in the queues. 
However, accelerated backpressure converges quickly enough that 
the variations in the underlying statistics at a rate of once every 
10 iterations has no significant effect. From Figure 3 it is clear 
that by having much more stable dual variables we are able to 
compute more effective transmission rate assignments and keep 
the queues from growing. 

VI. CONCLUSION 

In this work we have presented a novel method for computing 
packrouting in networks based on applying an approximate 
Newton's method to the backpressure routing problem. This 
approach retains the distributed information structure necessary 
for implementing the algorithm efficiently at the node level. We 
presented node level protocols and proved that our algorithm 
stabilizes queues provided the arrival rates and capacities are 
chosen such that it is possible to stabilize the queues. In nu- 
merical experiments we demonstrate significant improvements in 
convergence rate leading to much smaller queues and the ability 
to stabilize queues even when the arrival rate statistic vary. Our 
forthcoming work focuses on extending the analysis of the ABP 
algorithm to explicitly consider the convergence rates. 
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